7 research outputs found

An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

Author: A Kemmar
B Negrevergne
G Perez
NR Mabroukeh
T Guns
Publication date: 01/01/2016

The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding a scan of the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions. Comment: frequent sequence mining, constraint programming
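The first improvement named in the abstract (pre-computing the position at which a symbol stops being supported by a sequence) can be illustrated with a minimal PrefixSpan-style sketch. This is not the authors' implementation: the names `last_positions`, `project`, and `mine` are hypothetical, and the database is assumed to be a list of sequences of single symbols. The only idea it demonstrates is that a symbol is supported in a suffix exactly when its last position lies at or after the suffix start, which can be checked without scanning the suffix.

```python
from collections import defaultdict


def last_positions(seq):
    """Map each symbol to the last index at which it occurs in seq."""
    last = {}
    for i, sym in enumerate(seq):
        last[sym] = i
    return last


def project(db, last_pos, proj, symbol):
    """Extend the current prefix by `symbol`.

    proj is a list of (sid, start) pairs: sequence `sid` still matches
    the prefix and its remaining suffix begins at index `start`.
    """
    out = []
    for sid, start in proj:
        # Constant-time rejection: the symbol never occurs at or after start.
        if last_pos[sid].get(symbol, -1) < start:
            continue
        pos = db[sid].index(symbol, start)  # leftmost match in the suffix
        out.append((sid, pos + 1))
    return out


def mine(db, minsup):
    """Return {pattern: support} for all patterns with support >= minsup."""
    last_pos = [last_positions(s) for s in db]
    results = {}

    def expand(prefix, proj):
        # Count, per symbol, how many projected sequences still support it.
        counts = defaultdict(int)
        for sid, start in proj:
            for sym, lp in last_pos[sid].items():
                if lp >= start:
                    counts[sym] += 1
        for sym in sorted(counts):
            if counts[sym] >= minsup:
                new_proj = project(db, last_pos, proj, sym)
                pattern = prefix + (sym,)
                results[pattern] = len(new_proj)
                expand(pattern, new_proj)

    expand((), [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    db = [list("abcb"), list("acb"), list("bca")]
    for pattern, support in sorted(mine(db, minsup=2).items()):
        print("".join(pattern), support)
```

The sketch copies the projected database at every extension; the abstract's second improvement addresses exactly that cost.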

arXiv.org e-Print Archive

Crossref

DIAL UCLouvain

An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

Author: A Kemmar
B Negrevergne
G Perez
NR Mabroukeh
T Guns
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki’s cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding a scan of the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions. The data and software related to this paper are available at http://sites.uclouvain.be/cp4dm/spm/
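The second improvement in the abstract, a backtracking-aware structure inspired by CP trailing, can be sketched as follows. This is an illustrative assumption rather than the paper's data structure: `Trail` and `ProjectedDB` are hypothetical names, and the sketch records one undo entry per changed cell, whereas a real solver would likely use a more compact encoding. It shows the core idea only: changes to the projected database are logged so that backtracking restores the previous state without copying.

```python
class Trail:
    """Undo log: records overwritten values so they can be restored."""

    def __init__(self):
        self._undo = []   # entries of the form (array, index, old_value)
        self._marks = []  # undo-log length at each open search level

    def push(self):
        """Open a new search level."""
        self._marks.append(len(self._undo))

    def pop(self):
        """Close the current level, undoing every change made in it."""
        mark = self._marks.pop()
        while len(self._undo) > mark:
            arr, i, old = self._undo.pop()
            arr[i] = old

    def set(self, arr, i, value):
        """Write arr[i] = value, logging the old value if it changes."""
        if arr[i] != value:
            self._undo.append((arr, i, arr[i]))
            arr[i] = value


class ProjectedDB:
    """start[sid] is the suffix offset of sequence sid, or -1 once the
    sequence no longer supports the current prefix."""

    def __init__(self, db, trail):
        self.db = db
        self.trail = trail
        self.start = [0] * len(db)

    def extend(self, symbol):
        """Extend the prefix by `symbol` in place; return the new support."""
        support = 0
        for sid, s in enumerate(self.start):
            if s < 0:
                continue
            try:
                pos = self.db[sid].index(symbol, s)
            except ValueError:
                self.trail.set(self.start, sid, -1)
            else:
                self.trail.set(self.start, sid, pos + 1)
                support += 1
        return support


if __name__ == "__main__":
    trail = Trail()
    pdb = ProjectedDB([list("abcb"), list("acb"), list("bca")], trail)
    trail.push()
    print(pdb.extend("a"), pdb.start)
    trail.push()
    print(pdb.extend("b"), pdb.start)
    trail.pop()  # backtrack: the projection for prefix "a" is restored
    print(pdb.start)
```

Pairing each `push` with a `pop` around a search branch keeps a single shared `start` array valid for the whole depth-first search.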

Crossref

DIAL UCLouvain

A Global Constraint for Mining Sequential Patterns with GAP Constraint

Author: A Kemmar
B Negrevergne
C Li
G Yang
MJ Zaki
P Fournier-Viger
X Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/11/2015

HAL - Normandie Université

arXiv.org e-Print Archive

Crossref

HAL-Paris 13

Mining Time-constrained Sequential Patterns with Constraint Programming

Author: A Kemmar
A Kemmar
C Antunes
H Mannila
J Han
J Pei
J Wang
John O. R. Aoga
JOR Aoga
N Beldiceanu
NAK Desai
Pierre Schaus
R Henriques
R Henriques
S Kadioglu
T Guns
Tias Guns
Publication venue: 'Springer Science and Business Media LLC'

Crossref

User's Constraints in Itemset Mining

Author: A Kemmar
H Mannila
M Khiari
M Wojciechowski
N Lazaar
P Schaus
T Guns
T Guns
T Uno
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/08/2018

Discovering significant itemsets is one of the fundamental tasks in data mining. It has recently been shown that constraint programming is a flexible way to tackle data mining tasks. With a constraint programming approach, we can easily express and efficiently answer queries with user’s constraints on itemsets. However, in many practical cases queries also involve user’s constraints on the dataset itself. For instance, in a dataset of purchases, the user may want to know which itemset is frequent and the day on which it is frequent. This paper presents a general constraint programming model able to handle any kind of query on the dataset for itemset mining.
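The purchase example in the abstract (which itemset is frequent, and on which day) can be made concrete with a small sketch. It uses plain enumeration, not the constraint programming model the paper proposes, and every name in it (`frequent_by_day`, the `(day, items)` record layout, the `max_size` bound) is an assumption made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_by_day(purchases, minsup, max_size=2):
    """Return {(day, itemset): support} for itemsets bought together in at
    least `minsup` transactions on that day.

    purchases: iterable of (day, items) pairs; itemsets are reported as
    sorted tuples of at most `max_size` items.
    """
    by_day = defaultdict(list)
    for day, items in purchases:
        by_day[day].append(frozenset(items))

    result = {}
    for day, transactions in by_day.items():
        universe = sorted(set().union(*transactions))
        for k in range(1, max_size + 1):
            for candidate in combinations(universe, k):
                cset = frozenset(candidate)
                support = sum(1 for t in transactions if cset <= t)
                if support >= minsup:
                    result[(day, candidate)] = support
    return result


if __name__ == "__main__":
    purchases = [
        ("mon", {"bread", "milk"}),
        ("mon", {"bread", "milk", "eggs"}),
        ("tue", {"bread"}),
        ("tue", {"eggs", "bread"}),
    ]
    for key, support in sorted(frequent_by_day(purchases, minsup=2).items()):
        print(key, support)
```

Here the constraint on the dataset is simply "restrict to one day"; a declarative model would let the user state such a restriction as a constraint instead of hard-coding the grouping.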

Crossref

HAL Descartes

HAL-Artois

EpisodeSupport: A Global Constraint for Mining Frequent Patterns in a Long Sequence of Events

Author: A Kemmar
B Negrevergne
ED Dolan
G Pesant
JOR Aoga
JOR Aoga
KY Huang
L Kotthoff
L Kotthoff
P Schaus
Q Yang
R Agrawal
R Rawassizadeh
R Rawassizadeh
S Nijssen
T Guns
W Zhou
Z Yang
Publication venue: Springer
Publication date: 01/01/2018

Crossref

PolyPublie

Rare pattern mining: challenges and future perspectives

Author: A Inokuchi
A Kemmar
A Marascu
B Xu
C Berberidis
C Chen
C Giannella
C Raïssi
CC Aggarwal
CC Yu
CI Ezeife
CK Selvi
CKS Leung
CS Hemalatha
D Zhou
DWL Cheung
E Baralis
EM Elgaml
F Nori
F Rasheed
G Lee
H Albert-Lorincz
H Cao
H Yun
HF Li
HT Lam
J Ge
J Ge
J Han
J Han
J Han
J Li
J Pei
J Pillai
J Yang
J Zhu
JH Chang
K Kaneiwa
K Wang
KS Sadhasivam
KV Bhaskar
KY Huang
L Hui
M Adnan
M Adnan
M Capelle
M Deypir
M Deypir
M Garofalakis
M Kuramochi
MA Nishi
MG Elfeky
N Talukder
O Adam
P Tzvetkov
PS Tsai
R Agrawal
R Jindal
R Srikant
R Vijayalakshmi
RCW Wong
RJ Bayardo Jr
RU Kiran
SK Tanbeer
SK Tanbeer
SK Tanbeer
TP Hong
U Bhatt
U Yun
U Yun
U Yun
W Huo
WABWA Bakar
WC Peng
WG Aref
X Liu
X Liu
Y Aumann
Y Hirate
Y Huang
Y Ji
YC Lee
YL Chen
YL Chen
YS Koh
Z Zou
Publication venue: 'Springer Science and Business Media LLC'

Crossref